Purpose

The fulltext package includes a lightweight htmlwidget for fulltext output.

Getting Started

library(janeaustenr)
library(tokenizers)
library(fulltext)
library(polmineR)
library(magrittr)
library(data.table)

From tidytext to fulltext

# Split the novel into paragraphs at blank lines, then tokenize each paragraph
ftxt_list <- cut(seq_along(emma), c(1, grep("^\\s*$", emma), length(emma))) %>%
  split(janeaustenr::emma, f = .) %>%
  lapply(paste, collapse = " ") %>%
  tokenizers::tokenize_words(lowercase = FALSE, strip_punct = FALSE) %>%
  as.fulltexttable() %>%
  split(column = "tag_before", regex = "<para") %>%
  # Retag paragraphs that match "CHAPTER" as h2 headings
  retag(regex = "CHAPTER", old = "para", new = "h2") %>%
  as.fulltexttable() %>%
  # Split again at each "CHAPTER" token and name the resulting pieces
  split(column = "token", regex = "CHAPTER") %>%
  rename(name = sprintf("Chapter %d", seq_along(.)))
fulltext(ftxt_list[[1]])
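
The first three steps of the pipeline above (cut(), split(), paste()) may be easier to follow on a small example. The following is a minimal base R sketch that assumes a made-up character vector, toy_lines, in place of janeaustenr::emma. It only illustrates how lines are grouped into paragraphs at blank lines and does not touch the fulltext package.

```r
# toy_lines is a hypothetical stand-in for janeaustenr::emma
toy_lines <- c(
  "TITLE", "",
  "CHAPTER I", "",
  "Emma Woodhouse, handsome,", "clever, and rich.", "",
  "She was the youngest."
)

# Blank lines mark paragraph boundaries; they become the break points
blank <- grep("^\\s*$", toy_lines)
breaks <- c(1, blank, length(toy_lines))

# cut() assigns every line index to an interval between two breaks
groups <- cut(seq_along(toy_lines), breaks)

# split() collects the lines of each interval, paste() joins them into one string
paragraphs <- lapply(split(toy_lines, f = groups), paste, collapse = " ")
paragraphs <- trimws(unlist(paragraphs, use.names = FALSE))
print(paragraphs)
```

Note that cut() uses left-open intervals by default, so the very first line (index 1) falls outside every interval and is dropped by split(). The breaks must also be unique, so this approach assumes that the last line is not blank.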

From a subcorpus to fulltext

We introduce the fulltext package by example. In addition to the fulltext package, we need the polmineR package, which includes the GERMAPARLMINI corpus.

library(polmineR)
use("polmineR")
## ... activating corpus: GERMAPARLMINI
## ... activating corpus: REUTERS

The example aims at outputting one particular speech. We take a speech given by Volker Kauder in the German Bundestag.

sp <- corpus("GERMAPARLMINI") %>%
  subset(speaker == "Volker Kauder") %>%
  subset(date == "2009-11-10")

The data passed to the JavaScript that generates the output is expected to be a list of lists, each of which describes one section of text. Each sub-list is a named list with two elements: a character vector naming the HTML element the section will be wrapped in, and a data.frame (or a list) with a column “token” and a column “id”.
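
The structure described above can be sketched in base R. This is only an illustration under assumptions: the element names "element" and "tokenstream" are hypothetical, since the text does not name them, and the tokens and ids are made up. In practice, as.fulltexttable() prepares the data.

```r
# Hypothetical element names: "element" and "tokenstream" are assumptions
heading <- list(
  element = "h2",
  tokenstream = data.frame(
    token = c("Volker", "Kauder"),
    id = 1:2,
    stringsAsFactors = FALSE
  )
)
paragraph <- list(
  element = "p",
  tokenstream = data.frame(
    token = c("Herr", "Präsident", "!"),
    id = 3:5,
    stringsAsFactors = FALSE
  )
)

# The widget data is a list of such sections
sections <- list(heading, paragraph)

# Check that a section has the shape described in the text
is_valid_section <- function(x) {
  is.list(x) &&
    is.character(x$element) && length(x$element) == 1L &&
    all(c("token", "id") %in% names(x$tokenstream))
}
valid <- vapply(sections, is_valid_section, logical(1))
print(valid)
```

A check like is_valid_section() can be useful when assembling the input by hand rather than through as.fulltexttable().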

ftab <- as.fulltexttable(sp, headline = "Volker Kauder (CDU)", display = "block")

So let us see the result …

fulltext(ftab, box = TRUE)

Crosstalking

Preparations

# Add a "chapter" column to every table, keeping the list names
ftxt_list <- lapply(
  setNames(names(ftxt_list), names(ftxt_list)),
  function(chapter) data.frame(ftxt_list[[chapter]], chapter = chapter)
)

So this is the result.

Perspectives

Enjoy!